Abstract. The spatial distribution of galaxies we observe is subject to the condition
that we, human beings, are sitting right in a galaxy, the Milky Way. Thus the ergodicity
assumption is questionable in the interpretation of the observed galaxy distribution. The
resultant difference between the observed statistics (volume average) and the true cosmic
value (ensemble average) is termed the ergodicity bias. We perform an explicit numerical
investigation of the effect for a set of galaxy survey depths and near-end distance cuts.
It is found that the ergodicity bias in the observed two- and three-point correlation
functions is in most cases insignificant for modern analyses of samples from galaxy
surveys, which closes a loophole in precision cosmology. However, it may become
non-negligible in certain circumstances, such as applications involving the three-point
correlation function at large scales in local galaxy samples. One is thus reminded to
take extra care in galaxy sample construction and in the interpretation of the statistics
of the sample, especially when the characteristic redshift is low.
1. Introduction

One key support for the Cosmological Principle is the observed near-isotropy of the
cosmic microwave background radiation and of the angular distribution of galaxies. But
isotropy alone does not prove homogeneity; the crucial link from isotropy to homogeneity
is the Copernican Principle, which asserts that we are not privileged observers sitting in
a special place in the Universe.

Then there is the ergodicity assumption, which states that by averaging over a
sufficiently large volume the measured statistics (volume average) are equivalent to the
ensemble-averaged statistics. It is with the Cosmological Principle and the ergodicity
assumption that we believe that for any galaxy survey, as long as its effective volume is
sufficiently large that the cosmic variance can be ignored, the resulting sample is a
fair representation of the Universe [1, 2, 3].

It is true that there is no proper reason to revive the special status of human beings
in modern cosmology, although there are works claiming that we are at the center of a
giant local void (e.g. [4]). Nevertheless, strictly speaking, the validity of ergodicity
still requires averaging over observer positions to avoid possible selection bias.
Unfortunately, in reality, we are only able to observe the galaxy distribution from the
Milky Way. We are not statistically different from observers in galaxies other than
the Milky Way, but we are different from observers not residing in any galaxy. The
distribution of galaxies we observe should be interpreted as the distribution of our
neighbors. This point is a mathematical one rather than a philosophical one. Namely, to
interpret the observed galaxy distribution we have to invoke the conditional statistics
given the existence of the Milky Way in which we live, instead of the unconditional ones
that comply with the Copernican Principle and the ergodicity assumption.

For this reason, we call the difference between the volume-averaged galaxy
distribution observed by us and the ensemble average the ergodicity bias. This
is a previously unknown loophole in precision cosmology and galaxy statistics. The
statistical tools to deal with it turn out to be conditional statistics, which are
actually all there in the classical textbook [1]. We will see in the following sections
that this change in the way of thinking brings interesting conclusions about the measured
galaxy number density, the two-point correlation function (2PCF) and the three-point
correlation function (3PCF).

The idea is not completely new. Concern about the fairness of samples is repeatedly
expressed in the book [1], in which there is the clear recognition that the accidentally
perfect galaxy number counts Hubble achieved resulted partly from a "substantial
excess of bright galaxies due to the local concentration in and around the Virgo cluster"
(p. 5 of [1]). But to our knowledge, the paper presented here is the first to explicitly
address and numerically evaluate the ergodicity bias. We do find that the ergodicity
bias is negligible in most cases, which closes a loophole in modern cosmology and galaxy
statistics. However, in some cases, especially when the characteristic redshift is low, one
may need to take extra care of this ergodicity bias.

2. Distribution of galaxies as we observed 
2.1. Number density 

To see how it comes about, let us first check the spatial number density of galaxies. Let
$n_g(\mathbf{r})$ denote the local number density of galaxies at position $\mathbf{r}$;
then there are two averages: the ensemble average $\langle n_g(\mathbf{r})\rangle_e$ and
the spatial average $\langle n_g(\mathbf{r})\rangle_{\mathcal{R}}$ over the sample space
$\mathcal{R}$.

By the Cosmological Principle the ensemble average $\langle n_g(\mathbf{r})\rangle_e = n_0$
is a constant everywhere, while the spatial average

$$\langle n_g(\mathbf{r})\rangle_{\mathcal{R}} = \frac{\int_{\mathcal{R}} n_g(\mathbf{r})\,d\mathbf{r}}{\int_{\mathcal{R}} d\mathbf{r}} \qquad (1)$$

is not, and may depend on the position and the shape of the sample. The ergodicity
assumption then just makes the two equal if the sample space $\mathcal{R}$ is large enough
to suppress cosmic variance, no matter whether the large volume is achieved by increasing
the depth or by enlarging the sky coverage.

But the fact is that, as we are already in a galaxy, the isotropic radial number density
of objects at distance $r$ from us is a conditional number density and is expected to be

$$n(r) = n_0\,[1 + \xi_{Gg}(r)] , \qquad (2)$$

where $\xi_{Gg}$ is the two-point cross-correlation function between the Galaxy and the
sample galaxies. Then the measured mean number density for the sample defined by the
distance limits $[r_{\min}, r_{\max}]$ and a sky coverage of $4\pi f$ steradians is

$$\hat{n}_0 = \frac{\int_{r_{\min}}^{r_{\max}} 4\pi f\, n(r)\, r^2 dr}{\int_{r_{\min}}^{r_{\max}} 4\pi f\, r^2 dr} = n_0 \left[ 1 + \frac{3\int_{r_{\min}}^{r_{\max}} \xi_{Gg}(r)\, r^2 dr}{r_{\max}^3 - r_{\min}^3} \right] . \qquad (3)$$

It is very clear that this introduces a systematic bias ascribed to the long-range
correlation between the Galaxy and other galaxies; simply improving the sky coverage
cannot alleviate the bias.

The integral $\int_{r_{\min}}^{r_{\max}} \xi_{Gg}\, r^2 dr$ is in general not zero, except
for $r_{\min} = 0$, $r_{\max} = \infty$ or for a well-designed pair of distance cuts that
forces a zero, provided that $\xi_{Gg}$ is known in advance at any desired distance.
However, it is impossible to push $r_{\max}$ to infinity or to always have the luck of
meeting the right pair of distance cuts. The point is that no matter how deep or large the
sample is, there is in general a non-zero systematic offset $\hat{n}_0 - n_0$; however
small it is, our spatially averaged mean number density does not equal the ensemble
average, although it approaches it asymptotically, i.e. there is the ergodicity bias.
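As a rough numerical illustration of the fractional offset $(\hat{n}_0 - n_0)/n_0 = 3\int \xi_{Gg}\, r^2 dr / (r_{\max}^3 - r_{\min}^3)$, the sketch below evaluates it for a few hypothetical distance cuts. The power-law form of $\xi_{Gg}$ and its parameters (`R0`, `GAMMA`) are assumptions made only for this example, not the correlation function used in this paper. A power law has no zero-crossing, so the offset it gives is always positive.

```python
import numpy as np

R0, GAMMA = 5.0, 1.8  # assumed toy parameters (r0 in h^-1 Mpc), not from the text


def xi_Gg_toy(r):
    """Hypothetical power-law stand-in for xi_Gg(r); it has no zero-crossing."""
    return (np.asarray(r, dtype=float) / R0) ** (-GAMMA)


def density_bias(r_min, r_max, xi=xi_Gg_toy, n_steps=20001):
    """Return (n0_hat - n0) / n0 = 3 * int xi r^2 dr / (r_max^3 - r_min^3)."""
    r = np.linspace(r_min, r_max, n_steps)
    y = xi(r) * r**2
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(r))  # trapezoid rule
    return 3.0 * integral / (r_max**3 - r_min**3)


if __name__ == "__main__":
    # Hypothetical (r_min, r_max) cuts in h^-1 Mpc.
    for cuts in [(10.0, 50.0), (30.0, 180.0), (100.0, 600.0)]:
        print(cuts, density_bias(*cuts))
```

With this toy input the offset shrinks as the cuts move outward but never reaches zero, in line with the asymptotic behavior described above.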

As stated in Eq. [2], the modulation of the local galaxy number density depends on
$\xi_{Gg}$, which calls for caution in taking local galaxy samples for statistics related
to distance-number counts, e.g. the luminosity function: the redshift gradient resulting
from $\xi_{Gg}$ is in fact incorporated, unnoticed, into the evolution of the luminosity
function with redshift during estimation.

For statistical functions based on type classification, there is the additional
complication that $\xi_{Gg}$ for one class of galaxies might be very different from that
for another class. Furthermore, it has been detected that the color of galaxies, e.g.
$g - r$, is strongly correlated even when galaxies are separated by scales as large as
$\sim 20\,h^{-1}$ Mpc [6]; it is highly possible that samples of local galaxies with
$z \lesssim 0.007$ are more or less biased in color.

2.2. Two-point correlation function 

For an observer randomly placed in the Universe, the ensemble-averaged probability of
finding a pair of galaxies in two volume elements at positions $\mathbf{r}_1$ and
$\mathbf{r}_2$ is related to the two-point correlation function (2PCF) through

$$dP_2 \propto [1 + \xi_g(r_{12})]\, d\mathbf{r}_1 d\mathbf{r}_2 , \qquad (4)$$

with $r_{12} = |\mathbf{r}_1 - \mathbf{r}_2|$. It is this function $\xi_g$ that we aim at
measuring, and it should be equal to the estimated $\hat{\xi}_g$, which is defined through
our observed probability of finding a pair of galaxies

$$d\hat{P}_2 \propto [1 + \hat{\xi}_g(r_{12})]\, d\mathbf{r}_1 d\mathbf{r}_2 . \qquad (5)$$
Figure 1. Ergodicity biases in the 2PCF and 3PCF of samples with different distance cuts.
The almost horizontal lines are $\Delta\xi_g$ and $\Delta\zeta_g$ given by Eq. [11] and
Eq. [15] respectively, assuming $b_{Gg} = 1$; the pairs of numbers at their left ends label
the distance cuts $(r_{\min}, r_{\max})$ of hypothetical samples. The largest scale at
which the estimation of the 2PCF and 3PCF is robust is chosen to be
$(r_{\max} - r_{\min})/2$. Dashed lines refer to negative values. The left panel is for the
2PCF while the right panel displays the case of the 3PCF of equilateral configuration
$\zeta(r_{12} = r_{23} = r_{31} \equiv r)$. The top solid curve in the left plot is the
linear 2PCF at $z = 0$ derived from the power spectrum provided by CMBFAST [7] with
parameters $\Omega_m = 0.27$, $\Omega_b = 0.046$, $\Omega_\Lambda = 0.73$,
$\sigma_8 = 0.9$, $n = 1$, and the $\zeta$ in the right plot is the prediction of
Eulerian perturbation theory at tree level [8]. The dotted lines annotated with "Zwicky"
approximate the Zwicky catalogue, whose characteristic depth is $47.2\,h^{-1}$ Mpc, while
the dotted lines coincident with the lines of (30, 180) but marked with "S-W" mimic the
Shane-Wirtanen catalogue of characteristic depth $209\,h^{-1}$ Mpc [9]. Note that the two
dotted lines in the right panel are actually $-\Delta\zeta_g$.

However, since we are observers not randomly located but residing in a galaxy, itself an
object in the Universe, the observed probability of finding a pair of objects is in fact
conditional on the object at the origin and is a three-point problem (see p. 173
of [1]),

$$d\hat{P}_2 \propto [1 + \xi_{Gg}(r_1) + \xi_{Gg}(r_2) + \xi_g(r_{12}) + \zeta_{Ggg}(r_1, r_2, r_{12})]\, d\mathbf{r}_1 d\mathbf{r}_2 , \qquad (6)$$

in which $\zeta_{Ggg}$ is the three-point cross-correlation function. The 2PCF we observe
from a galaxy sample, before averaging over $\mathcal{R}$, is evidently in principle no
longer the one in Eq. [4] but

$$\hat{\xi}_g(\mathbf{r}_1, \mathbf{r}_2) = \xi_g(r_{12}) + \xi_{Gg}(r_1) + \xi_{Gg}(r_2) + \zeta_{Ggg}(r_1, r_2, r_{12}) , \qquad (7)$$

which can only be a good approximation to the targeted $\xi_g$ when
$\xi_{Gg}(r_1) + \xi_{Gg}(r_2) + \zeta_{Ggg}(r_1, r_2, r_{12}) \ll \xi_g(r_{12})$. This,
together with the fact that $\xi_{Gg}$ decreases with increasing distance and stays
positive before its zero-crossing, immediately leads to the amusing conclusion that
galaxies close to us are, on average, clustered more strongly than distant galaxies, even
if there is no evolution resulting from gravitation and the galaxy bias function.
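The size of the additive terms in Eq. (7) can be illustrated for two pairs with the same separation, one near the observer and one far away. This is a minimal sketch under loud assumptions: a single hypothetical power law stands in for both $\xi_g$ and $\xi_{Gg}$ (that is, $b_g = b_{Gg} = 1$), $\zeta_{Ggg}$ is dropped, and all distances are made-up values in $h^{-1}$ Mpc.

```python
def xi_toy(r, r0=5.0, gamma=1.8):
    """Hypothetical power law used for both xi_g and xi_Gg (b_g = b_Gg = 1 assumed)."""
    return (r / r0) ** (-gamma)


def observed_pair_xi(r1, r2, r12):
    """Eq. (7) with zeta_Ggg neglected: xi_g(r12) + xi_Gg(r1) + xi_Gg(r2)."""
    return xi_toy(r12) + xi_toy(r1) + xi_toy(r2)


if __name__ == "__main__":
    r12 = 10.0  # pair separation, identical for both pairs
    near = observed_pair_xi(15.0, 20.0, r12)   # pair close to the observer
    far = observed_pair_xi(150.0, 155.0, r12)  # pair far from the observer
    print("true xi_g(r12):", xi_toy(r12))
    print("near pair:", near, "far pair:", far)
```

With this toy input the nearby pair shows a visibly larger apparent correlation than the distant pair at the same separation, which is the trend stated above.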
The measured 2PCF is actually averaged over the sample space $\mathcal{R}$,

$$\hat{\xi}_g(r_{12}) = \frac{\int_{\mathcal{R}}\int_{\mathcal{R}} \hat{\xi}_g(\mathbf{r}_1, \mathbf{r}_2)\, \delta_D(|\mathbf{r}_1 - \mathbf{r}_2| - r_{12})\, d\mathbf{r}_1 d\mathbf{r}_2}{\int_{\mathcal{R}}\int_{\mathcal{R}} \delta_D(|\mathbf{r}_1 - \mathbf{r}_2| - r_{12})\, d\mathbf{r}_1 d\mathbf{r}_2} , \qquad (8)$$

where $\delta_D$ is the Dirac delta function. It is difficult to provide exact figures for
the errors implied by Eq. [8] before we acquire knowledge of the two- and three-point
cross-correlation functions between the Galaxy and the observationally selected sample
galaxies. Since in practice the near-end distance limits of galaxy samples are usually
greater than $\sim 10\,h^{-1}$ Mpc, the regime under consideration is fairly linear, and we
can comfortably assume that the bias of the Galaxy-galaxy cross-correlation functions
relative to the dark matter correlation functions is scale independent and linear, so that

$$\xi_g = b_g^2\, \xi , \qquad \xi_{Gg} = b_{Gg}\, \xi , \qquad (9)$$

with $b_g$ being the bias of the sample galaxies relative to the dark matter and $b_{Gg}$
being the bias of the Galaxy-galaxy correlation relative to the 2PCF of dark matter, $\xi$.
In the weakly nonlinear regime $\zeta \sim \xi^2$ while $\xi < 1$ and $b \sim 1 - 3$ (e.g.
[10, 11, 12, 13]), so the 3PCF term $\zeta_{Ggg} \ll \xi_{Gg}$ and goes to zero much
faster than $\xi$ as the scale increases; it can thus be ignored. Furthermore, there is a
$1 \leftrightarrow 2$ symmetry in Eq. [8]; we then have



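$$\Delta\xi_g \equiv \hat{\xi}_g(r_{12}) - \xi_g(r_{12}) \approx \frac{2\int_{\mathcal{R}}\int_{\mathcal{R}} \xi_{Gg}(r_1)\, \delta_D(|\mathbf{r}_1 - \mathbf{r}_2| - r_{12})\, d\mathbf{r}_1 d\mathbf{r}_2}{\int_{\mathcal{R}}\int_{\mathcal{R}} \delta_D(|\mathbf{r}_1 - \mathbf{r}_2| - r_{12})\, d\mathbf{r}_1 d\mathbf{r}_2} . \qquad (10)$$

We define a new function
$e(\mathbf{r}_1, r_{12}) = \int_{\mathcal{R}} \delta_D(|\mathbf{r}_1 - \mathbf{r}_2| - r_{12})\, d\mathbf{r}_2 / (4\pi r_{12}^2)$,
which is the fraction of the surface area of the sphere centered at $\mathbf{r}_1$ with
radius $r_{12}$ that lies inside $\mathcal{R}$. If the survey volume is sufficiently large
that the boundary effect is negligible, $e = 1$. In general, $e$ depends on the survey
geometry and can only be evaluated numerically. However, in the limit of full sky
coverage, the analytical expression of $e$ can be easily derived and
$e(\mathbf{r}_1, r_{12}) = e(r_1, r_{12})$. We then use this approximation,

$$\Delta\xi_g \approx \frac{2\int_{r_{\min}}^{r_{\max}} \xi_{Gg}(r_1)\, e(r_1, r_{12})\, r_1^2 dr_1}{\int_{r_{\min}}^{r_{\max}} e(r_1, r_{12})\, r_1^2 dr_1} , \qquad (11)$$

to evaluate the ergodicity bias.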
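For a full-sky shell $r_{\min} \le r \le r_{\max}$ the geometric factor $e(r_1, r_{12})$ follows from elementary geometry, and Eq. (11) then reduces to two one-dimensional integrals. The sketch below is one possible way to evaluate it, not necessarily the procedure used for Figure 1. The function names and the power-law $\xi_{Gg}$ with $b_{Gg} = 1$ are assumptions for illustration only; in particular the toy $\xi_{Gg}$ is not the CMBFAST-based 2PCF and has no zero-crossing.

```python
import numpy as np


def trapezoid(y, x):
    """Plain trapezoid rule (avoids depending on a specific NumPy version)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def shell_fraction(r1, s, r_min, r_max):
    """e(r1, s) for a full-sky shell r_min <= r <= r_max (requires r1 > 0, s > 0)."""
    r1 = np.asarray(r1, dtype=float)
    # A point on the sphere of radius s around r1 lies at distance
    # sqrt(r1^2 + s^2 + 2*r1*s*mu) from the observer, and the sphere's surface
    # area is uniform in mu = cos(theta) on [-1, 1].
    mu_lo = (r_min**2 - r1**2 - s**2) / (2.0 * r1 * s)
    mu_hi = (r_max**2 - r1**2 - s**2) / (2.0 * r1 * s)
    return 0.5 * (np.clip(mu_hi, -1.0, 1.0) - np.clip(mu_lo, -1.0, 1.0))


def delta_xi(s, r_min, r_max, xi_Gg, n_steps=4001):
    """Eq. (11): 2 * int xi_Gg e r1^2 dr1 / int e r1^2 dr1 at separation s."""
    r1 = np.linspace(r_min, r_max, n_steps)
    w = shell_fraction(r1, s, r_min, r_max) * r1**2
    return 2.0 * trapezoid(xi_Gg(r1) * w, r1) / trapezoid(w, r1)


def xi_Gg_toy(r, r0=5.0, gamma=1.8):
    """Hypothetical power-law xi_Gg with b_Gg = 1; not the CMBFAST-based 2PCF."""
    return (r / r0) ** (-gamma)


if __name__ == "__main__":
    # Hypothetical cuts (30, 180) h^-1 Mpc, separations up to (r_max - r_min)/2.
    for s in (1.0, 25.0, 75.0):
        print(s, delta_xi(s, 30.0, 180.0, xi_Gg_toy))
```

The fraction is computed from the range of $\mu = \cos\theta$ for which a point on the sphere stays inside the shell, so it equals 1 well inside the shell and drops towards 1/2 at either boundary for small separations.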

Several numerical examples are demonstrated in Figure [1]. The general trend of
$\Delta\xi_g$ is that it trails off when $r_{\min}$ and $r_{\max} - r_{\min}$ increase;
exceptions may occur when the zero-crossing scale of the 2PCF is between $r_{\min}$ and
$r_{\max}$. (1) In the limit that $r_{12} \ll (r_{\max} - r_{\min})/2$,
$e(r_1, r_{12}) \to 1$ for most $r_1$ in the survey volume, thus $\Delta\xi_g$ is not
sensitive to $r_{12}$ and to a good extent
$\approx 6 b_{Gg} \int_{r_{\min}}^{r_{\max}} \xi\, r^2 dr / (r_{\max}^3 - r_{\min}^3)$. As
$\int_0^\infty \xi_{Gg}(r)\, r^2 dr = 0$ and $\xi_{Gg}$ changes from positive to negative
from small to large scales, $\int_{r_{\min}}^{r_{\max}} \xi\, r^2 dr$ (and thus
$\Delta\xi_g$) can deviate significantly from zero for some configurations of
$[r_{\min}, r_{\max}]$. However, the condition $r_{12} \ll (r_{\max} - r_{\min})/2$ often
means that $r_{12}$ is small, $\xi(r_{12})$ is large and thus
$\Delta\xi_g \ll \xi(r_{12})$. (2) It appears that when the characteristic redshifts are
low and $r_{12} \sim (r_{\max} - r_{\min})/2$, $e(r_1, r_{12})$ can deviate considerably
from unity for many $r_1$ in the survey volume, and both $\Delta\xi_g$ and
$\Delta\xi_g/\xi$ could become significant, but in this case the cosmic variance often
overwhelms the ergodicity bias. (3) For deep surveys with $r_{\min} \gtrsim r_c$, the
ergodicity bias vanishes since $\Delta\xi_g \sim \xi(r_{\min}) \to 0$, where
$r_c \approx 120\,h^{-1}$ Mpc is the zero point of the correlation function
($\xi(r_c) = 0$). Thus it seems unlikely that the ergodicity bias can be significant in
practice.

2.3. Three-point correlation function

Similarly, the observed probability of finding a triplet of galaxies is conditional on the
Milky Way and becomes a four-point problem,

$$\begin{aligned}
d\hat{P}_3 \propto [\, & 1 + \xi_{Gg}(r_1) + \xi_{Gg}(r_2) + \xi_{Gg}(r_3) + \xi_g(r_{12}) + \xi_g(r_{23}) + \xi_g(r_{31}) \\
& + \xi_{Gg}(r_1)\xi_g(r_{23}) + \xi_{Gg}(r_2)\xi_g(r_{31}) + \xi_{Gg}(r_3)\xi_g(r_{12}) \\
& + \zeta_{Ggg}(r_1, r_2, r_{12}) + \zeta_{Ggg}(r_3, r_1, r_{31}) + \zeta_{Ggg}(r_2, r_3, r_{23}) + \zeta_g(r_{12}, r_{23}, r_{31}) \\
& + \eta_{Gggg}(r_1, r_2, r_3, r_{12}, r_{23}, r_{31}) \,]\, d\mathbf{r}_1 d\mathbf{r}_2 d\mathbf{r}_3 , \qquad (12)
\end{aligned}$$

in which $\eta_{Gggg}$ is the four-point cross-correlation function. The 3PCF we have is
practically estimated via

$$\hat{\zeta}_g = X - \hat{\xi}_g(r_{12}) - \hat{\xi}_g(r_{23}) - \hat{\xi}_g(r_{31}) - 1 , \qquad (13)$$

where $X$ denotes the average over the sample space $\mathcal{R}$ of all the terms inside
the square brackets in Eq. [12]. Substituting Eq. [7] for $\hat{\xi}_g$ then yields

$$\hat{\zeta}_g = \zeta_g + \langle \xi_{Gg} \rangle_{\mathcal{R}} \left[ \xi_g(r_{23}) + \xi_g(r_{31}) + \xi_g(r_{12}) - 3 \right] + \langle \eta_{Gggg} \rangle_{\mathcal{R}} . \qquad (14)$$

The ergodicity bias in the 3PCF is apparently much more difficult to analyze than that in
the 2PCF due to its complex configuration dependence. Nevertheless, if working on large
scales only, where $\xi \ll 1$, the higher-order terms in Eq. [14] can be neglected and
the dominant contribution comes just from the term
$-3\langle \xi_{Gg} \rangle_{\mathcal{R}}$. As an order-of-magnitude estimate, the
ergodicity bias in the 3PCF at large scales is therefore roughly

$$\Delta\zeta_g \equiv \hat{\zeta}_g - \zeta_g \approx -3\Delta\xi_g/2 . \qquad (15)$$
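The step from Eq. (13) to Eq. (14) can be checked term by term. The sketch below assumes that averaging over $\mathcal{R}$ leaves $\xi_g$ and $\zeta_g$ unchanged, that each $\xi_{Gg}(r_i)$ averages to the same $\langle\xi_{Gg}\rangle_{\mathcal{R}}$, that each $\zeta_{Ggg}$ term averages to the same $\langle\zeta_{Ggg}\rangle_{\mathcal{R}}$, and that the product terms factorize as $\langle\xi_{Gg}\rangle_{\mathcal{R}}\,\xi_g$. These simplifications are ours, introduced only to make the bookkeeping explicit.

```latex
\begin{align*}
X &= 1 + 3\langle\xi_{Gg}\rangle_{\mathcal{R}}
       + \sum_{\rm sides}\xi_g
       + \langle\xi_{Gg}\rangle_{\mathcal{R}}\sum_{\rm sides}\xi_g
       + 3\langle\zeta_{Ggg}\rangle_{\mathcal{R}}
       + \zeta_g + \langle\eta_{Gggg}\rangle_{\mathcal{R}} , \\
\hat{\xi}_g(r_{ij}) &= \xi_g(r_{ij}) + 2\langle\xi_{Gg}\rangle_{\mathcal{R}}
       + \langle\zeta_{Ggg}\rangle_{\mathcal{R}} , \\
\hat{\zeta}_g &= X - \sum_{\rm sides}\hat{\xi}_g - 1 \\
  &= \zeta_g + \langle\xi_{Gg}\rangle_{\mathcal{R}}
       \Big[\sum_{\rm sides}\xi_g - 3\Big]
       + \langle\eta_{Gggg}\rangle_{\mathcal{R}} ,
\end{align*}
% where \sum_{sides} runs over r_{12}, r_{23}, r_{31}.
```

The three $\langle\zeta_{Ggg}\rangle_{\mathcal{R}}$ terms cancel, and the six $\langle\xi_{Gg}\rangle_{\mathcal{R}}$ subtracted through the three $\hat{\xi}_g$ exceed the three present in $X$, which is the origin of the $-3$. Since the same assumptions give $\Delta\xi_g \approx 2\langle\xi_{Gg}\rangle_{\mathcal{R}}$, the leading term $-3\langle\xi_{Gg}\rangle_{\mathcal{R}}$ equals $-3\Delta\xi_g/2$, as in Eq. (15).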

It is known that the 3PCF approaches zero much faster than the 2PCF as the scale
increases, so the systematic bias identified here has a much stronger effect on the
third-order statistical functions, which is obvious in the right panel of Fig. [1].
Furthermore, as in most cases $\Delta\xi_g > 0$ for local galaxy samples, the ergodicity
bias in the 3PCF effectively behaves like a negative nonlinear galaxy bias parameter
$b_2$ [14], which raises serious questions about the reliability of the nonlinear galaxy
bias parameters estimated through the 3PCF of local galaxy samples, and hence of other
related results.

3. Discussion 

Here it is argued that, by changing the point of view so that the observed distribution
of galaxies in the Universe is regarded as the distribution of neighbors of our Galaxy,
the statistics of the distribution are conceptually very different from what we used to
think, though numerically the resulting ergodicity bias might be small for most practical
applications, especially when the galaxy sample is sufficiently far away from us and very
deep. Note that it has been assumed that the correlation functions between the Milky Way
and other galaxies follow the ensemble averages $\xi_{Gg}$ and $\zeta_{Ggg}$; in reality
the true correlation strength could deviate strongly from the mean, since our Galaxy is
located on the outskirts of a large cluster. The exact numerical effects have to be
explored carefully, perhaps with the help of numerical simulations.

Here we briefly discuss the impact of the ergodicity bias on precision cosmology.
(1) Baryonic acoustic oscillation (BAO) cosmology, which relies on the correlation
measurement at $r_{12} \sim 100\,h^{-1}$ Mpc. The mean redshifts of galaxy samples
constructed for BAO detection are in general at $z \sim 0.2$ or higher (e.g. [15]) and
thus $r_{\min} \gtrsim r_c$. We then expect the ergodicity bias to have little numerical
influence on the BAO detection. (2) The study of primordial non-Gaussianity through the
galaxy power spectrum [16, 17, 18] and bispectrum (e.g. [19]) at scales even larger than
$100\,h^{-1}$ Mpc. From Fig. [1] we can conclude that the ergodicity bias certainly biases
these results. Precision measurements of the primordial non-Gaussianity require larger
survey depths than we have numerically evaluated, for which the induced bias is unlikely
to be significant but may still be non-negligible. In particular, the method proposed by
[18] eliminates the cosmic variance in the power spectrum measurement by taking the ratio
of the power spectra of different tracers. Since taking the ratio does not eliminate the
additive ergodicity bias, its relative impact is enhanced. A robust evaluation of the
ergodicity bias in this case requires careful treatment of the survey boundary, the
selection function and the intrinsic evolution of galaxy number density and clustering.
We leave this detailed calculation for elsewhere.
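Why a ratio of tracer power spectra removes the common realization-dependent factor but not an additive offset can be seen in a toy calculation. Everything in the sketch below is hypothetical: the linear bias values, the matter power values standing in for different realizations, and the additive offsets are arbitrary numbers chosen for illustration, not estimates of the actual ergodicity bias.

```python
def tracer_power(b, P_matter, additive_bias=0.0):
    """Toy measured power of a tracer: b^2 * P_matter plus an additive offset."""
    return b**2 * P_matter + additive_bias


def power_ratio(b1, b2, P_matter, add1=0.0, add2=0.0):
    """Ratio of the toy measured power of tracer 1 to that of tracer 2."""
    return tracer_power(b1, P_matter, add1) / tracer_power(b2, P_matter, add2)


if __name__ == "__main__":
    # Different P values mimic cosmic-variance scatter of the same modes.
    for P in (800.0, 1000.0, 1200.0):
        clean = power_ratio(2.0, 1.0, P)                         # always b1^2 / b2^2
        biased = power_ratio(2.0, 1.0, P, add1=30.0, add2=20.0)  # offsets are made up
        print(P, clean, biased)
```

Without offsets the ratio is the same for every realization; with additive offsets that are not in the proportion $b_1^2 : b_2^2$ the ratio shifts away from $b_1^2/b_2^2$ and depends on the realization again.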

In this short report only the impact on the spatial distribution of galaxies is
discussed as an example; there are possibly many other aspects of the statistical analysis
of galaxy samples in need of a similar conceptual adjustment. For instance, the peculiar
velocity of a galaxy that we measure is actually the peculiar velocity of the galaxy
relative to our Galaxy, and the peculiar velocities of galaxies are correlated with the
peculiar velocity of the Milky Way.

We must stress that we are not challenging the Copernican Principle or the
Cosmological Principle here, but rather simply pointing out an observational effect. If
there were observers randomly placed in the Universe, they would reach the same
conclusion as ours about the sample provided by us. The last thing we want to
make clear is that the correlation between the Galaxy and other galaxies is not caused
by our Galaxy, but is inherited from the intrinsic correlation in the underlying dark
matter distribution and the roughly synchronous evolution of these galaxies.